Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

92 ◾ Bioinformatics

k-1 are connected by an edge. The contiguous reads are merged by finding Eulerian path,

which includes every edge exactly once. Most modern assemblers like ABySS and SPAdes

use de Bruijn graphs method because it is less complex than the overlap graphs. The qual-

ity of the de novo assembly is greatly affected by the value of the parameter k (number of

bases). For instance, assume that we have the read TCTACTAAGCTAACATGCAATGCA.

When we implement de Bruijn algorithm in any programming languages, we can obtain

the graphs showing the nodes (k-mers) and the edges as shown in Figure 3.5.

For both overlap and de Bruijn graphs, once we have constructed the graphs, the next

step is to find the relevant paths that are likely to represent the assembly. There should be a

single path for the assembly if there is no error or sequence repeats, which is rare in most

cases.

FIGURE 3.4 The overlaps and consensus sequence.

FIGURE 3.3 Overlap graphs and Hamiltonian path.